[parser.c] Various string unescape optimizations #920

samyron · 2025-12-31T04:00:15Z

This PR contains 4 commits which optimize json_string_unescape. Admittedly, I almost didn't create this PR as I don't love the Apple Aarch64-specific code... however I decided to leave it up to the maintainers of this library to decide if this code is worth accepting.

I omitted the SWAR / ARM Neon unless the code is being compiled on Apple platforms as a quick benchmark on my x86 laptop with the SWAR find_backslash and custom json_memcpy code showed it was no faster master. Looking at the memcpy and memchr on Ubuntu showed a AVX optimized libc. At the time the code was structured a bit differently and it was only copying sizeof(uint64_t) bytes at a time on x86. I could try implementing find_backslash and json_memcpy using SSE2 if desired. I should note that on memchr on macOS does seem to be optimized using ARM Neon instructions. Looking at the assembly, the implementation is similar though not quite the same. I think the performance improvement comes from reduced function call overhead / branching as find_backslashes ends up inlined.

With respect to keeping track of the additional_backslashes instead of has_more, if the string to unescape ends with a long run of characters that do not have any more backslashes, we can save a memchr and directly copy to the end of the string.

I'm cool if you decide not to accept any of these and simply close this PR. These are just some random things I've been testing out.

Benchmarks

These were run on an Macbook Air M1.

With only the `addtional_backslashes` commit:

The benchmark:

benchmark_parsing 'a bunch of escapes followed by a long string without escapes', JSON.generate("some\nescapes\twith\"a long\"string after\fso\twe\ncan test\nno\ffinal\tmemchr" * 4 + "this is some longish string! " * 512)

== Parsing a bunch of escapes followed by a long string without escapes (15162 bytes)
ruby 3.4.8 (2025-12-17 revision 995b59f666) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after    33.105k i/100ms
Calculating -------------------------------------
               after    345.312k (± 2.3%) i/s    (2.90 μs/i) -      1.755M in   5.083936s

Comparison:
              before:   307715.4 i/s
               after:   345311.9 i/s - 1.12x  faster

`additional_backslashes` and `find_backslashes`:

== Parsing a bunch of escapes followed by a long string without escapes (15162 bytes)
ruby 3.4.8 (2025-12-17 revision 995b59f666) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after    32.551k i/100ms
Calculating -------------------------------------
               after    351.976k (± 2.6%) i/s    (2.84 μs/i) -      1.790M in   5.089949s

Comparison:
              before:   307223.8 i/s
               after:   351976.5 i/s - 1.15x  faster


== Parsing activitypub.json (58160 bytes)
ruby 3.4.8 (2025-12-17 revision 995b59f666) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after     1.094k i/100ms
Calculating -------------------------------------
               after     10.824k (± 0.8%) i/s   (92.39 μs/i) -     54.700k in   5.054133s

Comparison:
              before:    10415.9 i/s
               after:    10823.5 i/s - 1.04x  faster

`additional_backslashes`, `find_backslashes`, `json_memcpy`:

== Parsing a bunch of escapes followed by a long string without escapes (15162 bytes)
ruby 3.4.8 (2025-12-17 revision 995b59f666) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after    32.384k i/100ms
Calculating -------------------------------------
               after    342.436k (± 3.2%) i/s    (2.92 μs/i) -      1.716M in   5.017785s

Comparison:
              before:   288084.1 i/s
               after:   342436.2 i/s - 1.19x  faster


== Parsing activitypub.json (58160 bytes)
ruby 3.4.8 (2025-12-17 revision 995b59f666) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after     1.128k i/100ms
Calculating -------------------------------------
               after     11.272k (± 0.7%) i/s   (88.71 μs/i) -     56.400k in   5.003607s

Comparison:
              before:    10586.3 i/s
               after:    11272.4 i/s - 1.06x  faster

`additional_backslashes`, `find_backslashes`, `json_memcpy`, `unescape_unicode`:

== Parsing activitypub.json (58160 bytes)
ruby 3.4.8 (2025-12-17 revision 995b59f666) +YJIT +PRISM [arm64-darwin24]
Warming up --------------------------------------
               after     1.037k i/100ms
Calculating -------------------------------------
               after     11.368k (± 0.6%) i/s   (87.96 μs/i) -     57.035k in   5.017163s

Comparison:
              before:    10600.4 i/s
               after:    11368.4 i/s - 1.07x  faster

…tra memchr searching the remaining characters when no more backslashes exist.

…AArch64 architecture.

…arch64 platforms.

byroot · 2025-12-31T09:44:42Z

ext/json/ext/parser/parser.c

+    signed char b0 = digit_values[p[0]];
+    signed char b1 = digit_values[p[1]];
+    signed char b2 = digit_values[p[2]];
+    signed char b3 = digit_values[p[3]];


I think there is a risk of reading past the end of the buffer here.

Prior digit_values[p[0]] < 0 would return if p[0] is a NULL byte.

I think this optimization does make sense, but we need to ensure we can actually read 4 bytes.

Actually, I checked all the callsites, they all already ensure that.

So I think I'm good with that optimization, except I think I'd like to refactor the bound check inside the function.

byroot · 2025-12-31T09:51:39Z

as I don't love the Apple Aarch64-specific code... however I decided to leave it up to the maintainers of this library to decide if this code is worth accepting.

To be frank with you, my opinion on the matter is that Apple hardware is overwhelmingly used in development, almost never in production, so it's not really worth taking on extra complexity just for that environment, unless of course the gain is massive.

I do however like your two optimizations that aren't specific to macOS+aarch64.

byroot · 2025-12-31T10:07:42Z

Alright, I cherry picked d21d936 and 976ba36 on master.

Thank you for the PR!

samyron · 2025-12-31T16:18:59Z

as I don't love the Apple Aarch64-specific code... however I decided to leave it up to the maintainers of this library to decide if this code is worth accepting.

To be frank with you, my opinion on the matter is that Apple hardware is overwhelmingly used in development, almost never in production, so it's not really worth taking on extra complexity just for that environment, unless of course the gain is massive.

+1. I support this decision.

Thanks again for your stewardship of this gem!

samyron added 4 commits December 30, 2025 20:48

Keep track of the the number of additional backslashes to avoid an ex…

542c636

…tra memchr searching the remaining characters when no more backslashes exist.

Use a custom find_backslash instead of memchr on Apple machines with …

0334772

…AArch64 architecture.

Use a custom memcpy implementation when unescaping strings on Apple A…

546048d

…arch64 platforms.

Simplify unescape_unicode

c25c300

byroot reviewed Dec 31, 2025

View reviewed changes

byroot closed this Dec 31, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[parser.c] Various string unescape optimizations #920

[parser.c] Various string unescape optimizations #920

samyron commented Dec 31, 2025 •

edited

Loading

Uh oh!

byroot Dec 31, 2025

Uh oh!

byroot Dec 31, 2025

Uh oh!

byroot commented Dec 31, 2025

Uh oh!

byroot commented Dec 31, 2025

Uh oh!

samyron commented Dec 31, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

[parser.c] Various string unescape optimizations #920

[parser.c] Various string unescape optimizations #920

Conversation

samyron commented Dec 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmarks

With only the addtional_backslashes commit:

additional_backslashes and find_backslashes:

additional_backslashes, find_backslashes, json_memcpy:

additional_backslashes, find_backslashes, json_memcpy, unescape_unicode:

Uh oh!

byroot Dec 31, 2025

Choose a reason for hiding this comment

Uh oh!

byroot Dec 31, 2025

Choose a reason for hiding this comment

Uh oh!

byroot commented Dec 31, 2025

Uh oh!

byroot commented Dec 31, 2025

Uh oh!

samyron commented Dec 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

samyron commented Dec 31, 2025 •

edited

Loading

With only the `addtional_backslashes` commit:

`additional_backslashes` and `find_backslashes`:

`additional_backslashes`, `find_backslashes`, `json_memcpy`:

`additional_backslashes`, `find_backslashes`, `json_memcpy`, `unescape_unicode`:

samyron commented Dec 31, 2025 •

edited

Loading